## **Practice 0416**

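- 1. The Cray C90 has two lanes, but it requires 4 cycles of dead time between any two vector instructions executed on the same functional unit, even when there is no data dependence between them. For a maximum vector length of 128, by how much does dead time reduce peak performance? If the number of lanes is increased to 16, what is the reduction then?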
- 2. Consider the execution of the following loop, which increments each element of an integer array, on a two-issue processor, once without speculation and once with speculation:

```
Loop: LD     R2,0(R1)    ; R2 = array element
      DADDIU R2,R2,#1    ; increment R2
      SD     R2,0(R1)    ; store result
      DADDIU R1,R1,#8    ; increment pointer
      BNE    R2,R3,Loop  ; branch if not last element
```

Assume that there are separate integer functional units for effective address calculation. Assume that up to two instructions of any type can commit per clock. Calculate the clock cycle numbers for the first three iterations of this loop for both processors.

- 3. Briefly explain the meaning of instruction-level parallelism, thread-level parallelism, and data-level parallelism. (Use as few words as possible while remaining clear.)
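
For Problem 1, the arithmetic can be set up as in the minimal sketch below. It rests on assumptions that the problem statement does not spell out: full-length vector instructions are issued back to back on the same functional unit, each lane completes one element per cycle, and nothing other than dead time stalls the unit. The function name `dead_time_loss` and its parameters are hypothetical, introduced only for illustration.

```python
import math


def dead_time_loss(vector_length: int, lanes: int, dead_time: int) -> float:
    """Fraction of peak throughput lost to dead time on one functional unit.

    Assumes back-to-back full-length vector instructions and one element
    per lane per cycle (assumptions, not stated in the problem).
    """
    # Cycles the unit spends producing results for one vector instruction.
    busy_cycles = math.ceil(vector_length / lanes)
    # The unit is then idle for dead_time cycles before the next instruction starts.
    return dead_time / (busy_cycles + dead_time)


for lanes in (2, 16):
    loss = dead_time_loss(vector_length=128, lanes=lanes, dead_time=4)
    print(f"{lanes} lanes: {loss:.1%} of peak lost")
```

Under these assumptions the loss is 4/(64+4) with 2 lanes and 4/(8+4) with 16 lanes: the dead time is fixed, so it takes a larger share of each instruction's time as more lanes shorten the busy period.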